What are Anomalies?

Get an introduction to anomalies in a dataset and understand how the mean and standard deviation are used to identify them.

We'll cover the following

Introduction
Mathematical foundation

Introduction

An anomaly in a data series is a significant deviation from some reasonable value. For example, look at this series of numbers: which number stands out?

The number that stands out in this series is 12.

Scatter plot for the series

Spotting the outlier is intuitive for a human, but a computer program has no such intuition. To find the anomaly automatically, we need to describe it mathematically.

Mathematical foundation

To find the anomaly in the series, we first need to define a reasonable value, and then define how far from that value a data point must be before we consider the deviation significant. For the reasonable value, let's use the mean.


The mean is ~4.33.
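As a minimal sketch of this step in Python: the lesson's own series and query are not reproduced here, so the list below is a hypothetical series, assumed for illustration and chosen so that its mean comes out at about 4.33 with 12 as the obvious outlier.

```python
from statistics import mean

# Hypothetical example series (an assumption, not the lesson's data):
# chosen so that the mean is about 4.33 and 12 stands out.
series = [2, 3, 5, 2, 3, 12, 5, 3, 4]

# The "reasonable value": the arithmetic mean (sum of values / number of values).
avg = mean(series)
print(round(avg, 2))  # 4.33
```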
Next, we need to define the deviation. Let's use the standard deviation.


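The standard deviation is the square root of the variance, which measures the average squared distance from the mean (the sample variance divides the sum of squared distances by n - 1 rather than n). In this case, the standard deviation is approximately 3.08.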
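The sketch below, again using the hypothetical series assumed for illustration, computes the standard deviation by hand both ways and checks the results against Python's `statistics.pstdev` (population) and `statistics.stdev` (sample). For this assumed series, the sample version reproduces the 3.08 quoted above, while the population version gives about 2.91; which variant the lesson's original query used is not shown here.

```python
from statistics import mean, pstdev, stdev

series = [2, 3, 5, 2, 3, 12, 5, 3, 4]  # hypothetical series, as before

avg = mean(series)
# Sum of squared distances from the mean.
squared_distances = sum((x - avg) ** 2 for x in series)

# Population variance divides by n; sample variance divides by n - 1.
population_sd = (squared_distances / len(series)) ** 0.5
sample_sd = (squared_distances / (len(series) - 1)) ** 0.5

# The hand-rolled values agree with the standard library's helpers.
assert abs(population_sd - pstdev(series)) < 1e-9
assert abs(sample_sd - stdev(series)) < 1e-9

print(round(population_sd, 2), round(sample_sd, 2))  # 2.91 3.08
```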

Now that we have defined a “reasonable” value and a deviation, we can define a range of acceptable values.


The range we defined extends one standard deviation on either side of the mean. Any value outside this range is considered an anomaly.
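Written as a formula, with $\bar{x}$ the mean and $s$ the standard deviation, the rule flags a value $x$ as an anomaly when it falls outside $[\bar{x} - s, \bar{x} + s]$. Plugging in the rounded figures quoted in this lesson gives the acceptable range and the check for 12:

```latex
\[
  x \text{ is an anomaly} \iff x < \bar{x} - s \ \text{ or } \ x > \bar{x} + s
\]
\[
  [\bar{x} - s,\ \bar{x} + s] \approx [4.33 - 3.08,\ 4.33 + 3.08] = [1.25,\ 7.41]
\]
\[
  12 > 7.41 \implies 12 \text{ is an anomaly}
\]
```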


Applying this range to the series, we find that the value 12 falls outside the range of acceptable values, so we identify it as an anomaly.
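Putting the steps together, here is a minimal Python sketch of the whole rule. The function name `find_anomalies` and its `num_sd` parameter are hypothetical, introduced only for illustration; it assumes the sample standard deviation, and the series is the same hypothetical one used above.

```python
from statistics import mean, stdev


def find_anomalies(values, num_sd=1.0):
    """Return the values lying more than num_sd standard deviations from the mean.

    Hypothetical helper: uses the sample standard deviation (n - 1), and
    num_sd=1.0 matches the one-standard-deviation range described above.
    """
    avg = mean(values)
    sd = stdev(values)
    lower, upper = avg - num_sd * sd, avg + num_sd * sd
    # Anything strictly outside [lower, upper] is flagged as an anomaly.
    return [x for x in values if x < lower or x > upper]


series = [2, 3, 5, 2, 3, 12, 5, 3, 4]  # hypothetical series, as before
print(find_anomalies(series))  # [12]
```

Raising `num_sd` widens the acceptable range, so fewer values are flagged; with `num_sd=3.0`, for example, nothing in this series is reported.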
